Goto

Collaborating Authors

 multi-group mean estimation


An active learning framework for multi-group mean estimation

Neural Information Processing Systems

We consider a fundamental problem where there are multiple groups whose data distributions are unknown, and an analyst would like to learn the mean of each group. We consider an active learning framework to sequentially collect $T$ samples with bandit, each period observing a sample from a chosen group. After observing a sample, the analyst may update their estimate of the mean and variance of that group and choose the next group accordingly. The objective is to dynamically collect samples to minimize the $p$-norm of the vector of variances of our mean estimators after $T$ rounds. We propose an algorithm, Variance-UCB, that selects groups according to a an upper bound on the variance estimate adjusted to the $p$-norm chosen. We show that the regret of Variance-UCB is $O(T^{-2})$ for finite $p$, and prove that no algorithm can do better. When $p$ is infinite, we recover the $O(T^{-1.5})$


Exploration-free Algorithms for Multi-group Mean Estimation

Wei, Ziyi, Zhong, Huaiyang, Li, Xiaocheng

arXiv.org Machine Learning

We study the problem of multi-group mean estimation, where the task is to allocate a limited sampling budget across multiple groups in order to estimate their means uniformly well. This problem arises naturally in polling, survey design, marketing, and other settings where representative estimates across diverse groups are required. A key feature distinguishing this setting from classical reward-maximization bandits is that the optimal allocation requires sampling every arm on the order of Θ(T) times, rather than focusing as much as possible on the best option. This structural property suggests that explicit exploration phases are unnecessary and opens the door to exploration-free algorithms. Contextual information makes the problem even more relevant in real-world applications such as healthcare (Bastani and Bayati, 2020; Du et al., 2024), recommendation systems (Agarwal et al., 2009; Li et al., 2010), and dynamic pricing (Qiang and Bayati, 2016; Ban and Keskin, 2021), where side information fundamentally shapes the reward distributions and motivates the estimation of context-dependent group parameters. Accurate estimation in this richer setting is crucial for interpretable personalization, robust policy design, and fairness considerations.


An active learning framework for multi-group mean estimation

Neural Information Processing Systems

We consider a fundamental problem where there are multiple groups whose data distributions are unknown, and an analyst would like to learn the mean of each group. We consider an active learning framework to sequentially collect T samples with bandit, each period observing a sample from a chosen group. After observing a sample, the analyst may update their estimate of the mean and variance of that group and choose the next group accordingly. The objective is to dynamically collect samples to minimize the p -norm of the vector of variances of our mean estimators after T rounds. We propose an algorithm, Variance-UCB, that selects groups according to a an upper bound on the variance estimate adjusted to the p -norm chosen.